INTRODUCTION


Modern weapons systems often require pattern recognition devices and algorithms. The three-layer neural network, consisting of an input layer, a hidden layer, and an output layer, has great potential as a pattern classifier. Determining the size of the hidden layer and computing the weights are two of the primary design problems associated with layered networks. This report considers only piecewise linear networks, that is, networks with piecewise linear transfer functions. General introductions to layered networks appear in References 1 and 2.

Using linear algebraic methods, we determine a lower bound on the number of hidden neurons as a function of the input and output dimensions and of the number of pattern prototypes. Appropriate weights for matching the desired input/output pairs are constructed.

The network of interest is called a (d, L, m) network, where

d = dimension of the input space (number of input nodes)

L = dimension of the intermediate space (number of hidden neurons)

m = dimension of the output space (number of output nodes).

The network transfer function is denoted $F_W$, where W is the 'weight vector.' Actually, W consists of two matrices A and B and two vectors a and b.

A is L by d, a is L by 1

B is m by L, b is m by 1

$$A^+(x) = Ax + a \quad \text{for } x \in \mathbb{R}^d, \qquad A^+ : \mathbb{R}^d \to \mathbb{R}^L$$

$$B^+(u) = Bu + b \quad \text{for } u \in \mathbb{R}^L, \qquad B^+ : \mathbb{R}^L \to \mathbb{R}^m.$$

$A^+$ and $B^+$ are affine transformations, while A and B are linear. The weight vector W can be considered a member of $\mathbb{R}^w$, where

$$w = dL + L + Lm + m,$$

the total number of entries (parameters) in A, a, B, and b. A typical weight initialization task involves finding a $W \in \mathbb{R}^w$ for which

$$F_W(x_j) = y_j \quad \text{for } 1 \le j \le N,$$

where the $(x_j, y_j)$'s are desired input/output pairs.

In Reference 2 it is shown that one cannot find such a W for 'general' families of N input/output pairs when

$$N > \frac{w}{m} = L + 1 + \frac{dL + L}{m}. \qquad (1)$$

For example, the number of general pairs for which the solution vector W exists in a (20, 30, 5) network is at most 157. Inequality (1) holds for general sigmoidal neuron transfer functions.
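A quick check of the arithmetic in the (20, 30, 5) example: there w = 20·30 + 30 + 30·5 + 5 = 785, so w/m = 157. The short Python sketch below reproduces this count; the helper names are illustrative only and do not come from the report.

```python
def weight_count(d: int, L: int, m: int) -> int:
    """Number of parameters w = dL + L + Lm + m in A, a, B, and b."""
    return d * L + L + L * m + m


def max_general_pairs(d: int, L: int, m: int) -> int:
    """Largest integer N with N <= w/m; by inequality (1), more general pairs cannot be fitted."""
    return weight_count(d, L, m) // m


print(weight_count(20, 30, 5), max_general_pairs(20, 30, 5))  # 785 157
```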

This 'dimensional bound' on the number of general input/output pairs that a layered network can accommodate follows from comparing the space of output sets with the weight space. The space of all possible N-tuples $Y = (y_1, y_2, \ldots, y_N)$ of desired outputs has dimension mN. For a fixed N-tuple $X = (x_1, x_2, \ldots, x_N)$ of inputs, the set $S(X) = \{F_W(X) : W \in \mathbb{R}^w\}$ of all output images of X must cover the mN-dimensional space of possible outputs. Under appropriate hypotheses regarding the neuron transfer function it follows that

$$w \ge mN.$$

That is, the weight space, which is the domain of the function $F(X) : W \mapsto F_W(X)$, must have dimension at least as great as that of the range, $\mathbb{R}^{mN}$, which is the space of N-tuples of outputs. It follows that

$$N \le w/m.$$

It is shown in Reference 3 that the (d, L, m) network can accommodate at least d + 1 input/output pairs when $d \ge L \ge m$ and at least L + 1 pairs in any case. For $L \ge 3m/2$, the algorithm described in Section 3 allows one to raise the lower bound on the number of input/output pairs to d + L - 1.

BACKGROUND AND NOTATION


The piecewise linear neuron transfer function p is defined by

$$p(t) = \begin{cases} -1 & \text{for } t < -1, \\ t & \text{for } -1 \le t \le 1, \\ 1 & \text{for } 1 < t. \end{cases}$$

The network transfer function satisfies

$$F_W(x) = B\, p^{(L)}(Ax + a) + b.$$

Equivalently,

$$F_W = B^+ \circ p^{(L)} \circ A^+,$$

where $p^{(L)}$ is the extension of p to $\mathbb{R}^L$,

$$p^{(L)}(u) = (p(u_1), p(u_2), \ldots, p(u_L))^T, \qquad u = (u_1, u_2, \ldots, u_L)^T,$$

and $\circ$ denotes function composition.
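The transfer functions above translate directly into array code. The following is a minimal NumPy sketch, assuming the weights are stored as separate arrays A, a, B, b with the shapes listed earlier; the function names are hypothetical.

```python
import numpy as np


def p(t):
    """Piecewise linear squash: -1 below -1, identity on [-1, 1], +1 above 1."""
    return np.clip(t, -1.0, 1.0)


def network(A, a, B, b, x):
    """F_W(x) = B p^(L)(A x + a) + b for a (d, L, m) network.

    A is L x d, a has length L, B is m x L, b has length m.
    np.clip acts elementwise, so p applied to a vector plays the role of p^(L).
    """
    return B @ p(A @ x + a) + b


A = np.array([[2.0], [0.5]])   # L = 2 hidden neurons, d = 1 input
a = np.zeros(2)
B = np.array([[1.0, 1.0]])     # m = 1 output
b = np.zeros(1)
print(network(A, a, B, b, np.array([1.0])))  # [1.5]: the first neuron saturates at 1
```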

The following definitions from affine and linear geometry are useful.

Affine space. H is a k-dimensional affine subspace of $\mathbb{R}^d$ provided $H = H_0 + a$, where $a \in \mathbb{R}^d$ and $H_0$ is a k-dimensional linear subspace of $\mathbb{R}^d$; i.e., H must be a translate of a k-dimensional linear subspace.

Affine equivalence. Two ordered k-subsets $X = (x_1, \ldots, x_k)$ and $Z = (z_1, \ldots, z_k)$ of $\mathbb{R}^d$ are affine equivalent provided there is an invertible affine mapping $A^+ : \mathbb{R}^d \to \mathbb{R}^d$ for which $A^+(x_j) = z_j$ for $1 \le j \le k$.

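Affine generation. For a finite subset $X = \{x_1, x_2, \ldots, x_k\}$ of $\mathbb{R}^d$, the affine subspace generated by X, denoted $\langle\langle X \rangle\rangle$, is defined by

$$\langle\langle X \rangle\rangle = \Big\{ \sum_j \lambda_j x_j : \sum_j \lambda_j = 1 \Big\}.$$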

General position. A subset X of $\mathbb{R}^d$ is in general position provided every k-subset S of X generates a (k-1)-dimensional affine subspace of $\mathbb{R}^d$ for $1 \le k \le d + 1$.
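General position can be tested numerically by computing affine dimensions as matrix ranks. This brute-force Python sketch (the function name and the rank tolerance are our assumptions) is only practical for small point sets, since it enumerates all (d+1)-subsets.

```python
from itertools import combinations

import numpy as np


def in_general_position(X, tol=1e-9):
    """Check the definition above for the points stored as rows of X.

    When X has at least d+1 points it suffices to test the (d+1)-subsets:
    every smaller subset lies inside one of them, and subsets of affinely
    independent sets are affinely independent.
    """
    X = np.asarray(X, dtype=float)
    N, d = X.shape
    k = min(N, d + 1)
    if k <= 1:
        return True  # a single point spans a 0-dimensional affine subspace
    for idx in combinations(range(N), k):
        P = X[list(idx)]
        # the affine dimension of P is the rank of the differences P[i] - P[0]
        if np.linalg.matrix_rank(P[1:] - P[0], tol=tol) < k - 1:
            return False
    return True
```

For example, the four corners of a square are in general position in the plane, while any three collinear points are not.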

Convexity. A subset C of $\mathbb{R}^d$ is convex provided $\lambda x + (1 - \lambda) y \in C$ whenever $x, y \in C$ and $0 \le \lambda \le 1$. Equivalently, a set is convex if it contains every line segment joining two of its members.

REMARKS. The affine subspace of $\mathbb{R}^d$ generated by the set $X = \{x_1, x_2, \ldots, x_k\}$ is also equal to $\langle X_0 \rangle + x_k$, where $X_0 = \{x_1 - x_k, x_2 - x_k, \ldots, x_{k-1} - x_k\}$ and $\langle X_0 \rangle$ denotes the linear subspace generated by $X_0$.

The close relationship between convexity and affine generation should be noted. The convex hull of a subset X of $\mathbb{R}^d$ is defined by

$$\mathrm{Hull}(X) = \Big\{ \sum_j \alpha_j x_j : \sum_j \alpha_j = 1 \text{ and all } \alpha_j \ge 0 \Big\}.$$

Thus, the convex hull is that part of the affine subspace determined by nonnegative coefficients. The affine subspace generated by X is the intersection of all affine subspaces containing X. Similarly, the convex hull of X is the intersection of all convex sets containing X.

The importance of affine geometry and affine equivalence stems from the fact that each layer of weights defines an affine transformation. Thus, if there exist weights that map the inputs $X = \{x_j\}$ into the outputs $Y = \{y_j\}$, then there also exist weights for the pairs in X', Y' whenever X' is affine equivalent to X and Y' is affine equivalent to Y.

A basic result from convexity theory, which we state here without proof, is

RADON'S THEOREM. If X is a set of d + 2 points in general position in $\mathbb{R}^d$, then there is a unique partition $X = S \cup T$ of X into two proper, disjoint subsets such that $\mathrm{Hull}(S) \cap \mathrm{Hull}(T) \ne \emptyset$.
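Radon's theorem is constructive in practice: an affine dependence among the d + 2 points yields the partition. The NumPy sketch below, with a hypothetical function name, reads that dependence off a singular value decomposition; it assumes the points really are in general position, so the dependence is unique up to scale.

```python
import numpy as np


def radon_partition(X):
    """Split d+2 points (rows of X) in general position in R^d into the Radon pair (S, T).

    An affine dependence sum_j lam_j x_j = 0 with sum_j lam_j = 0 spans the null
    space of the (d+1) x (d+2) matrix [X^T; 1 ... 1].  Points with positive
    coefficients form S, the rest form T, and the normalized positive part is a
    point lying in both Hull(S) and Hull(T).
    """
    X = np.asarray(X, dtype=float)
    M = np.vstack([X.T, np.ones(len(X))])
    lam = np.linalg.svd(M)[2][-1]  # right singular vector for the zero singular value
    S = np.flatnonzero(lam > 0)
    T = np.flatnonzero(lam <= 0)
    common = lam[S] @ X[S] / lam[S].sum()
    return S, T, common


S, T, c = radon_partition([[0, 0], [2, 0], [0, 2], [0.5, 0.5]])
print(S, T, c)  # the interior point is split off from the triangle's vertices
```

The overall sign of the null vector is arbitrary, so the roles of S and T may be swapped; the common point does not change.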


PROJECTIONS OF FINITE SETS


The network transfer function $F_W$ of a (d, L, m) network consists of two affine mappings with a piecewise linear truncation (or squash) in between. The output of the hidden layer is the result of the affine mapping $x \mapsto Ax + a$ followed by the L-dimensional truncation $p^{(L)}$. Thus, the ith coordinate $u^{(i)}(x)$ of the intermediate output is given by

$$u^{(i)}(x) = p(A_i x + a_i),$$

where $A_i$ is the ith row of A and $a_i$ is the ith coordinate of a. The mapping $u^{(i)} : \mathbb{R}^d \to \mathbb{R}$ is constant in each of the half-spaces $D_-$ and $D_+$, where

$$D_- = \{x : A_i x + a_i \le -1\}, \qquad D_+ = \{x : A_i x + a_i \ge 1\}.$$

In the infinite strip $D_0 = \mathbb{R}^d \setminus (D_- \cup D_+)$,

$$u^{(i)}(x) = A_i x + a_i.$$

Thus, the mapping $u^{(i)}$ is piecewise affine on $\mathbb{R}^d$, with its three affine parts determined by the affine projection $x \mapsto A_i x + a_i$.

The following lemmas provide helpful tools for constructing the affine mappings required in a piecewise linear network.

LEMMA 1. If $S = \{s_j\}$ is a set of d + 1 points in general position in $\mathbb{R}^d$ and $z = (z_j)$ is a vector of d + 1 reals, then there exists a unique affine functional $x \mapsto f(x) = \alpha x + a$ satisfying $f(s_j) = z_j$, $1 \le j \le d + 1$. Here $\alpha$ is a d-dimensional row vector and a is a real scalar.

PROOF. Let $t_j = s_j - s_{d+1}$ and $v_j = z_j - z_{d+1}$ for $1 \le j \le d$. Now let

$$T = [t_1, t_2, \ldots, t_d]$$

and

$$v = (v_1, v_2, \ldots, v_d).$$

Since S is in general position, the $t_j$'s form a basis. Thus, T is nonsingular and the equation $yT = v$ has a unique solution $y = \alpha$ in $\mathbb{R}^d$. Then the desired affine functional is $f(x) = \alpha x + a$, where $a = z_{d+1} - \alpha s_{d+1}$. Uniqueness of f also follows from the nonsingularity of T.

Lemma 1 is the affine version of the fact that the linear functional mapping d independent vectors in d-space onto a prescribed set of d numbers exists and is unique.
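The proof of Lemma 1 is a single linear solve. A minimal NumPy sketch follows, assuming the points are stored as rows; `affine_interpolant` is a name chosen here for illustration.

```python
import numpy as np


def affine_interpolant(S, z):
    """Lemma 1: the unique affine f(x) = alpha . x + a with f(s_j) = z_j.

    S holds d+1 points of R^d in general position (one per row), z holds d+1 reals.
    As in the proof, t_j = s_j - s_{d+1} and v_j = z_j - z_{d+1}; alpha solves
    t_j . alpha = v_j for all j, and a = z_{d+1} - alpha . s_{d+1}.
    """
    S = np.asarray(S, dtype=float)
    z = np.asarray(z, dtype=float)
    T = S[:-1] - S[-1]             # rows are t_1, ..., t_d
    v = z[:-1] - z[-1]
    alpha = np.linalg.solve(T, v)  # raises LinAlgError if S is not in general position
    a = z[-1] - alpha @ S[-1]
    return alpha, a


# By hand, f(x) = 2 x_1 + 4 x_2 + 1 takes the values 1, 3, 5 at these three points.
alpha, a = affine_interpolant([[0, 0], [1, 0], [0, 1]], [1, 3, 5])
```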

LEMMA 2. Suppose X is a finite set in general position in $\mathbb{R}^d$, $S = \{s_1, s_2, \ldots, s_d\}$ is a d-subset of X, K is a positive real, and $z = (z_1, z_2, \ldots, z_d)$ is a real d-vector. Then there exists an affine functional $f : x \mapsto \alpha x + a$ satisfying

$$f(s_j) = z_j \quad \text{for } 1 \le j \le d$$

and

$$|f(x)| \ge K \quad \text{for } x \in X \setminus S.$$

PROOF. First note that the result is trivial if X = S. For $X \ne S$, $X \setminus S$ is nonempty, and we let $x_0$ be a fixed member of $X \setminus S$. Next, we apply Lemma 1 to the (d+1)-set $\{x_0\} \cup S$ twice: we first obtain the affine functional g satisfying

$$g(s_j) = z_j \ \text{ for } 1 \le j \le d, \qquad g(x_0) = 0,$$

and then obtain the affine functional h satisfying

$$h(s_j) = 0 \ \text{ for } 1 \le j \le d, \qquad h(x_0) = 1.$$

The desired affine functional is $f = g + Mh$, where

$$M = \max\left\{ \frac{K + |g(x)|}{|h(x)|} : x \in X \setminus S \right\}.$$

Note that h maps all d members of S into 0. However, h is not the zero mapping since $h(x_0) = 1$. It follows that the zero set of h is the hyperplane $H_0$ containing S. Since X is in general position, no members of X (other than those in S) lie in $H_0$. Thus, $h(x) \ne 0$ for all $x \in X \setminus S$ and M is well defined. The summands g and Mh of f have the following properties:

g maps the members of S into the desired outputs;

Mh spreads all the members of $X \setminus S$ away from 0 while preserving the desired values $g(s_j)$.

Indeed, for $x \in X \setminus S$ we have $|f(x)| \ge M|h(x)| - |g(x)| \ge (K + |g(x)|) - |g(x)| = K$.
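The construction f = g + Mh in the proof of Lemma 2 can be sketched as follows. This Python/NumPy version repeats the Lemma 1 solve so that it is self-contained, takes the first point of X \ S as x_0, and assumes X \ S is nonempty; all names are illustrative.

```python
import numpy as np


def affine_interpolant(S, z):
    """Lemma 1: unique affine f(x) = alpha . x + a with f(s_j) = z_j on d+1 points (rows of S)."""
    S = np.asarray(S, dtype=float)
    z = np.asarray(z, dtype=float)
    alpha = np.linalg.solve(S[:-1] - S[-1], z[:-1] - z[-1])
    return alpha, z[-1] - alpha @ S[-1]


def lemma2_functional(X, s_idx, z, K):
    """Lemma 2: affine f with f(s_j) = z_j on the d rows s_idx, and |f(x)| >= K elsewhere."""
    X = np.asarray(X, dtype=float)
    s_set = set(s_idx)
    rest = [i for i in range(len(X)) if i not in s_set]
    if not rest:
        raise ValueError("X equals S; any interpolating functional will do")
    d = len(s_idx)
    P = np.vstack([X[s_idx], X[rest[0]]])                              # the (d+1)-set S U {x0}
    g_alpha, g_a = affine_interpolant(P, np.append(z, 0.0))            # g = z on S, g(x0) = 0
    h_alpha, h_a = affine_interpolant(P, np.append(np.zeros(d), 1.0))  # h = 0 on S, h(x0) = 1
    g = X[rest] @ g_alpha + g_a
    h = X[rest] @ h_alpha + h_a               # nonzero off S because X is in general position
    M = np.max((K + np.abs(g)) / np.abs(h))
    return g_alpha + M * h_alpha, g_a + M * h_a


X = np.array([[0, 0], [1, 0], [0, 1], [2, 3], [-1, 3]], dtype=float)
alpha, a = lemma2_functional(X, [0, 1], [0.5, -0.25], K=1.0)
print(X @ alpha + a)  # first two values are 0.5 and -0.25; the rest have magnitude >= 1
```

With K = 1, squashing by p sends every point off S to -1 or +1 according to its side of the hyperplane through S, which is how the lemma is used below.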

Repeated application of Lemma 2 to the L real outputs of the hidden layer allows one to send some inputs to the corners of the cube $[-1, 1]^L$ while placing the others at arbitrary locations in that cube. The combinatorial configuration of inputs in $\mathbb{R}^d$ determines how the points in $X \setminus S$ are separated by the hyperplane through S. This technique is best understood by analyzing the following examples.

EXAMPLE 1. We consider a (3,3,2) network acting on five points in $\mathbb{R}^3$. Let $X = \{x_1, x_2, x_3, x_4, x_5\}$ be a set of inputs in general position in $\mathbb{R}^3$ and let Ext(X) be the set of extreme points (vertices) of Hull(X). At least one of the ten segments joining pairs of points from X meets the interior of Hull(X). More precisely, if $|\mathrm{Ext}(X)| = 4$, then there are four such interior segments, while there is only one if $|\mathrm{Ext}(X)| = 5$. (This follows from Radon's Theorem as stated in the previous section.)


Without loss of generality we may assume that $\{x_4, x_5\}$ bounds an interior segment of Hull(X). Figure 1 shows two different configurations of five points in $\mathbb{R}^3$ with $\{x_4, x_5\}$ an interior segment. Choose a basis $\{b_1, b_2, b_3\}$ for $\mathbb{R}^3$ with $b_1 = x_4 - x_5$. Let $Q : \mathbb{R}^3 \to \mathbb{R}^3$ be the linear transformation defined by

$$Q b_j = \begin{cases} 0 & \text{for } j = 1, \\ b_j & \text{for } j = 2 \text{ or } 3, \end{cases}$$


FIGURE 1. Two Configurations in 3-Space With $\{x_4, x_5\}$ an Internal Edge.

and let $q_j = Q x_j$. The mapping Q is a rank 2 mapping from $\mathbb{R}^3$ onto the two-dimensional subspace Range(Q). The four points $q_1, q_2, q_3, q_4$ are distinct, while $q_5 = q_4$. Moreover, since $\{x_4, x_5\}$ bounds an interior segment in Hull(X), $q_4$ must lie in the two-dimensional interior of $\mathrm{Hull}(\{q_1, q_2, q_3, q_4\})$. It follows that $q_4$ lies inside the triangle with vertices $q_1$, $q_2$, $q_3$. Figure 2 shows the configuration of q's in $\mathbb{R}^2$. The purpose of analyzing the image of X under Q is to select hyperplanes in $\mathbb{R}^3$ for application of Lemma 2. The three lines in Range(Q) joining $q_4$ to $q_1$, $q_2$, and $q_3$ correspond to the three planes in $\mathbb{R}^3$ determined by the triples $\{x_1, x_4, x_5\}$, $\{x_2, x_4, x_5\}$, and $\{x_3, x_4, x_5\}$. The position of $q_4$ relative to $\{q_1, q_2, q_3\}$ determines how the three hyperplanes partition the remaining points. Indeed, the partitions by the hyperplanes are identical to those of the lines in Range(Q). Let $L_i$ denote the line through $q_i$ and $q_4$ and let $H_i$ denote the plane through $x_i$, $x_4$, and $x_5$, for $1 \le i \le 3$. The plane $H_i$ separates the remaining two x's in $\mathbb{R}^3$ if and only if the line $L_i$ separates the remaining two q's in Range(Q). Because $q_4$ lies inside the triangle, each line $L_i$ crosses the opposite side of the triangle and therefore separates the other two vertices. Therefore, each plane $H_i$ separates the remaining two points in $\mathbb{R}^3$. Moreover, this fact is a consequence of our choosing an interior segment to define Q. Choice of an exterior edge to define Q would have resulted in only one of the three hyperplanes separating the remaining two points. This geometry forms the basis for Example 2.
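The separation property of Example 1 can be checked on a concrete configuration. The five points below are a made-up example (not the points of Figure 1) in which the segment from x_4 to x_5 passes through the triangle x_1 x_2 x_3. This NumPy sketch represents Range(Q) by an orthonormal basis orthogonal to b_1, which is one convenient choice of Q, and compares the side of each point relative to the planes H_i and to the lines L_i.

```python
import numpy as np

# Hypothetical configuration: x1, x2, x3 lie in the plane z = 0 and the
# segment from x4 to x5 crosses the interior of the triangle x1 x2 x3.
X = np.array([[1.0, 0.0, 0.0],
              [0.0, 1.0, 0.0],
              [-1.0, -1.0, 0.0],
              [0.0, 0.0, 1.0],
              [0.1, 0.05, -1.0]])
x4, x5 = X[3], X[4]

# Q annihilates b1 = x4 - x5; express Range(Q) in an orthonormal basis orthogonal to b1.
b1 = x4 - x5
basis = np.linalg.svd(b1.reshape(1, 3))[2][1:]   # 2 x 3, rows orthogonal to b1
q = X @ basis.T                                  # 2-D coordinates of the q_j; q5 equals q4


def plane_side(i, j):
    """Sign of x_j relative to the plane H_i through x_i, x4, x5 (0-based i, j)."""
    return np.sign(np.linalg.det(np.array([X[i] - x4, x5 - x4, X[j] - x4])))


def line_side(i, j):
    """Sign of q_j relative to the line L_i through q_i and q4 (0-based i, j)."""
    u, v = q[i] - q[3], q[j] - q[3]
    return np.sign(u[0] * v[1] - u[1] * v[0])


for i in range(3):
    j, k = [t for t in range(3) if t != i]
    by_plane = plane_side(i, j) * plane_side(i, k) < 0
    by_line = line_side(i, j) * line_side(i, k) < 0
    print(f"H{i + 1} separates x{j + 1} and x{k + 1}: {by_plane}; L{i + 1} agrees: {by_plane == by_line}")
```

Moving x_5 so that {x_4, x_5} becomes an exterior edge should leave only one of the three planes separating its remaining pair, as the text states.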


FIGURE 2. Projection Into Plane For Example 1.


Lemma 2 allows us to define an affine mapping $A_i^+$ corresponding to the plane $H_i$, $1 \le i \le 3$. In each case, we select the values on the triple S to lie in the interval [-1, 1] and set the lower bound K = 1. Thus, of the five outputs $u_{ij} = p(A_i^+(x_j))$, $1 \le j \le 5$, of the ith hidden neuron, $1 \le i \le 3$, three may be placed arbitrarily in [-1, 1], while the other two must be -1 and 1. Table 1 lists the outputs $u_j = (u_{1j}, u_{2j}, u_{3j})^T$, $1 \le j \le 5$, at the hidden layer.


TABLE 1. Coordinates of $u_j$ for Example 1.

j    $u_j$
1    $(u_{11}, -1, 1)^T$
2    $(1, u_{22}, -1)^T$
3    $(-1, 1, u_{33})^T$
4    $(u_{14}, u_{24}, u_{34})^T$
5    $(u_{15}, u_{25}, u_{35})^T$

The nine variables $u_{ij}$ in Table 1 can be independently chosen in the interval [-1, 1]. This provides considerable flexibility in positioning the images $u_j$ of the five inputs at the hidden layer so as to facilitate the final mapping into the desired outputs $y_j$. Figure 3 shows the five u's in $\mathbb{R}^3$. The points $u_1$, $u_2$, and $u_3$ lie on edges of the cube. This follows from the fact that each of them has one variable coordinate, as shown in Table 1. Similarly, $u_4$ and $u_5$ each have all three coordinates variable, which allows them to be placed anywhere in the cube.

FIGURE 3. Five Points $u_j$ of Example 1 in the 3-Cube; $u_4$, $u_5$ Can Be Anywhere.


EXAMPLE 2. Again, we consider a (3,3,2) network acting on the set $X = \{x_1, x_2, x_3, x_4, x_5\}$ in $\mathbb{R}^3$. At least six of the ten segments joining pairs of points of X must be edges in the boundary of Hull(X). In this example, we assume that $\{x_4, x_5\}$ determines an exterior edge. Defining $b_j$, $q_j$, and Q as in Example 1 makes $q_4$ an extreme point of $\mathrm{Hull}(\{q_1, q_2, q_3, q_4\})$. It follows that only one of the three lines through the pairs $\{q_1, q_4\}$, $\{q_2, q_4\}$, and $\{q_3, q_4\}$ separates the remaining two q's. We may assume that the line through $q_2$ and $q_4$ is that line. Let $H_1$ be a plane in $\mathbb{R}^3$ which intersects Hull(X) only in the edge joining $x_4$ and $x_5$. Such a plane exists because $\{x_4, x_5\}$ is an exterior edge of Hull(X). Next, we let $H_2$ and $H_3$ be the planes in $\mathbb{R}^3$ through $\{x_1, x_4, x_5\}$ and $\{x_3, x_4, x_5\}$, respectively. Thus neither $H_2$ nor $H_3$ separates its remaining two points, and $H_1$ leaves $x_1$, $x_2$, and $x_3$ on the same side. Each plane $H_i$ corresponds to a line $L_i$ in Range(Q); the three lines $L_i$ are shown in Figure 4.

Applying Lemma 2, we again obtain a mapping $A_i^+$ corresponding to the plane $H_i$, $1 \le i \le 3$. Table 2 shows the five outputs $u_j$ of the hidden layer, where $u_{ij} = p(A_i^+(x_j))$, $1 \le j \le 5$, as in Example 1.


FIGURE 4. Projection Into Plane of Example 2.

In this geometry, eight variables $u_{ij}$ in Table 2 can be chosen independently in the interval [-1, 1]. Figure 5 shows the five u's in $\mathbb{R}^3$; $u_2$ lies at the corner $(1, 1, -1)^T$; $u_1$ and $u_3$ lie on edges; and $u_4$ and $u_5$ can be anywhere. A critical feature of this geometry is that $u_1$, $u_2$, and $u_3$ can be placed arbitrarily close together by letting $u_{21}$ approach 1 and $u_{33}$ approach -1.


TABLE 2. Coordinates of $u_j$ for Example 2.

j    $u_j$
1    $(1, u_{21}, -1)^T$
2    $(1, 1, -1)^T$
3    $(1, 1, u_{33})^T$
4    $(u_{14}, u_{24}, u_{34})^T$
5    $(u_{15}, u_{25}, u_{35})^T$

FIGURE 5. Five Points $u_j$ of Example 2 in the 3-Cube; $u_4$, $u_5$ Can Be Anywhere.


WEIGHT ASSIGNMENT ALGORITHM


The geometry of Example 2 is the basis for the algorithm discussed in this section. We first present a brief description of the algorithm. Next, the algorithm will be applied to Example 2, and finally the general algorithm will be described in detail.


Suppose that d, m, and N are given; i.e., we require a mapping from $\mathbb{R}^d$ to $\mathbb{R}^m$ which accommodates N given input/output pairs. The first task is to choose L. The following cases arise when considering how large L must be.


Case 1. $N \le d + 1$. Let $L = \min[d, m]$.

Case 2. $d + 2 \le N \le d + 3m/2 - 1$. Let $L = 3n$, with $n = \lceil m/2 \rceil$.

Case 3. $d + 3m/2 \le N$ and $m \ge 2$. Let $L = 3n$, with $n = \lceil (N - d + 1)/3 \rceil$.

Case 4. $d + 3m/2 \le N$ and $m = 1$.

In Case 1, N is small enough to employ the methods of Reference 3. The algorithm presented here is intended for Cases 2, 3, and 4. When $L_2 < L_1$ in Case 4, a method presented in Reference 4 is used. In Cases 2, 3, and 4, the inequalities

$$N \le d + L - 1$$

and

$$3m \le 2L$$

both hold. These are the only requirements for the algorithm.
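Since the algorithm only needs L = 3n with N ≤ d + L - 1 and 3m ≤ 2L, the smallest admissible L follows from two ceilings: 3m ≤ 6n gives n ≥ m/2, and N ≤ d + 3n - 1 gives n ≥ (N - d + 1)/3. The Python sketch below computes it; it is our own reading of those two requirements, not a transcription of Cases 1 through 4, and the function names are hypothetical.

```python
def ceil_div(p: int, q: int) -> int:
    """Integer ceiling of p / q for q > 0 (avoids floating-point rounding)."""
    return -(-p // q)


def smallest_L(d: int, m: int, N: int) -> int:
    """Smallest L = 3n satisfying N <= d + L - 1 and 3m <= 2L."""
    n = max(ceil_div(m, 2), ceil_div(N - d + 1, 3), 1)
    return 3 * n


L = smallest_L(20, 5, 40)  # n = max(3, 7) = 7, so L = 21
print(L, 40 <= 20 + L - 1, 3 * 5 <= 2 * L)
```

In the range covered by Case 2 the second ceiling never exceeds the first, so this reproduces $L = 3\lceil m/2 \rceil$ there.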

$L = 3n$ in both Cases 2 and 3. If $3m < 2L$, we increase m to $2L/3$, adding new coordinates to each output while maintaining general position in $\mathbb{R}^m$. Likewise, we add new input/output pairs, if necessary, to achieve $N = d + L - 1$. These modifications enable us to assume that $N = d + L - 1$, $L = 3n$, and $m = 2n$. In the presence of these assumptions, the input to the algorithm consists of

(I1) Network parameters d, L, m, with $L = 3n$

(I2) Set X of N input points in general position in $\mathbb{R}^d$, $N = d + L - 1$

(I3) Set Y of N desired outputs in general position in $\mathbb{R}^m$.

We assume, of course, that $y_j$ is the desired output for $x_j$. Hence, we seek a set W of weights for the (d, L, m) network satisfying

$$F_W(x_j) = y_j \quad \text{for } 1 \le j \le N.$$

The algorithm proceeds as follows. First one must determine a facet of a facet of Hull(X); i.e., a subset S of X for which there exists a facet F of X satisfying $S \subset F$ and $|S| = d - 1$. A facet F of X must be a d-subset, so a facet of F must be a (d-1)-subset. In Figure 1(a) all of the pairs of points except $\{x_4, x_5\}$ are facets of facets. For example, $\{x_1, x_2, x_4\}$ is a facet of Hull(X) and $\{x_1, x_2\}$ is a facet of $\{x_1, x_2, x_4\}$.


Assume, without loss of generality, that

$$S = \{x_{3n+1}, x_{3n+2}, \ldots, x_N\}.$$

Let G be the (d-2)-dimensional affine subspace through S and let P be the 2-dimensional linear subspace of $\mathbb{R}^d$ perpendicular to G. Let Q be the orthogonal projection from $\mathbb{R}^d$ to P. Q maps S onto a single point s in P.

Since S is a (d-2)-face of Hull(X), there is a hyperplane $H_0$ through S (which must contain G) that does not separate $X \setminus S$. $H_0$ intersects P in a line $L_0$ through s, which does not separate $QX \setminus \{s\}$ in P. Thus, there is a linear ordering on the members of $QX \setminus \{s\}$ determined by the angles between $L_0$ and the vectors $Qx_j - s$ in P. We assume that $(x_1, x_2, \ldots, x_L)$ is the linear order. It should be noted here that a cyclic ordering is always induced on a set in the plane by specifying a center point. It is only when the center point is an extreme point of the hull that the 'endpoints' are uniquely defined.
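The angular ordering can be computed from polar angles about s. In this sketch (hypothetical function name; points given in 2-D coordinates of P), the widest angular gap around s must exceed π when s is an extreme point, and the ordering starts just after that gap. Directing the supporting line the other way would simply reverse the order.

```python
import numpy as np


def angular_order(q, s):
    """Indices of the 2-D points q (rows) sorted by angle about the extreme point s."""
    v = np.asarray(q, dtype=float) - np.asarray(s, dtype=float)
    phi = np.mod(np.arctan2(v[:, 1], v[:, 0]), 2 * np.pi)
    idx = np.argsort(phi)
    a = phi[idx]
    gaps = np.diff(np.append(a, a[0] + 2 * np.pi))  # the last gap wraps around
    g = int(np.argmax(gaps))
    if gaps[g] <= np.pi:
        raise ValueError("s is not an extreme point of the projected set")
    # the arc occupied by the vectors q - s begins right after the widest gap
    return np.roll(idx, -((g + 1) % len(idx)))


order = angular_order([[1.0, 0.1], [-1.0, 0.2], [0.0, 1.0]], [0.0, 0.0])
```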

The significance of the linear ordering is the following. The hyperplane $H_j$ through $\{x_j\} \cup S$ decomposes $X \setminus (\{x_j\} \cup S)$ into

$$\{x_1, x_2, \ldots, x_{j-1}\} \cup \{x_{j+1}, x_{j+2}, \ldots, x_{N-d+1}\}$$

for $1 \le j \le N - d + 1$.

Each of the hyperplanes $H_j$, $0 \le j \le N - d + 1$, yields an affine functional via Lemma 2. Acting on the N points of X with these functionals and 'squashing' gives the N - d + 2 coordinates for each of the N points shown in Table 3. The image of $x_j$ in (N - d + 2)-space is $u_j$. These points can be considered the output at a hidden layer containing N - d + 2 neurons. Those coordinates that can assume any value in [-1, 1] are marked with an asterisk (*).

Our algorithm does not use all of the N - d + 2 columns in Table 3 as neuron coordinates at the hidden layer. The appropriate array of coordinates is constructed by deleting the columns numbered 3k - 1 (that is, $H_{3k-1}$), $1 \le k \le n$, and duplicating the columns numbered 3k ($H_{3k}$), $1 \le k \le n - 1$. This leaves an array with L columns corresponding to the L hidden neurons. One additional modification is required: the variable entry in one member of each pair of duplicated columns is fixed. The effect of this construction is the clustering of triples of outputs at n corners of the cube, $n = L/3$. The transpose of the L by N array of hidden layer coordinates is shown in Table 4 with the columns relabeled 1 to L. The rows are partitioned into triples to illustrate the clustering.

TABLE 3. Coordinates of X in (N-d+2)-Space.

j        H_0   H_1   H_2   H_3   H_4   ...   H_{N-d-1}   H_{N-d}   H_{N-d+1}
1         1     *    -1    -1    -1    ...      -1          -1         -1
2         1     1     *    -1    -1    ...      -1          -1         -1
3         1     1     1     *    -1    ...      -1          -1         -1
4         1     1     1     1     *    ...      -1          -1         -1
...
N-d       1     1     1     1     1    ...       1           *         -1
N-d+1     1     1     1     1     1    ...       1           1          *
N-d+2     *     *     *     *     *    ...       *           *          *
N-d+3     *     *     *     *     *    ...       *           *          *
...
N-1       *     *     *     *     *    ...       *           *          *
N         *     *     *     *     *    ...       *           *          *

*Can be anything in [-1, 1].

TABLE 4. Coordinates of X in L-Space.

j       1     2     3     4     5     6    ...   L-1    L
1       1     *    -1    -1    -1    -1    ...   -1    -1
2       1     1    -1    -1    -1    -1    ...   -1    -1
3       1     1     *    -1    -1    -1    ...   -1    -1

4       1     1     1     1     *    -1    ...   -1    -1
5       1     1     1     1     1    -1    ...   -1    -1
6       1     1     1     1     1     *    ...   -1    -1
...
L-3     1     1     1     1     1     1    ...   -1    -1

L-2     1     1     1     1     1     1    ...    *    -1
L-1     1     1     1     1     1     1    ...    1    -1
L       1     1     1     1     1     1    ...    1     *

L+1     *     *     *     *     *     *    ...    *     *
L+2     *     *     *     *     *     *    ...    *     *
...
N-1     *     *     *     *     *     *    ...    *     *
N       *     *     *     *     *     *    ...    *     *

*Can be anything in [-1, 1].

Next, for $1 \le k \le n$, we replace the * in entry (3k-2, 3k-1) with the value $1 - \delta$, $0 < \delta < 1$. Similarly, the * in entry (3k, 3k) is replaced by $-1 + \delta$. The rows of the resulting array are denoted $u_j(\delta)^T$. The following equations now hold among the $u_j(\delta)$:

$$u_{3k-1}(\delta) = (\underbrace{1, 1, \ldots, 1}_{\text{coordinates } 1, \ldots, 3k-1}, \underbrace{-1, -1, \ldots, -1}_{\text{coordinates } 3k, \ldots, L})^T$$

$$u_{3k-2}(\delta) = u_{3k-1} - \delta e_{3k-1}$$

$$u_{3k}(\delta) = u_{3k-1} + \delta e_{3k},$$

where $u_{3k-1} = u_{3k-1}(1)$ and $e_j$ is the jth elementary vector,

$$e_j = (0, 0, \ldots, 0, \underbrace{1}_{j\text{th}}, 0, 0, \ldots, 0)^T.$$

As $\delta$ approaches 0, each of the $u_j(\delta)$ approaches one of the corners $u_{3k-1}$. The following decomposition of $\mathbb{R}^L$ is critical to the algorithm:

$$\mathbb{R}^L = U \oplus E,$$

where $\{u_2, u_5, \ldots, u_{3n-1}\}$ is a basis for U and $\{e_2, e_3, e_5, e_6, \ldots, e_{3n-1}, e_{3n}\}$ is a basis for E. Here we let $u_{3k-1} = u_{3k-1}(\delta)$, since $u_{3k-1}(\delta)$ is independent of $\delta$, $1 \le k \le n$. For all $\delta > 0$, the $u_j(\delta)$'s, $1 \le j \le 3n$, form a basis for $\mathbb{R}^L$. Therefore, there exists a linear transformation $T(\delta)$ that maps every $u_j(\delta)$ into $y_j$, $1 \le j \le 3n$. $T(\delta)$ decomposes naturally into the following sum:

$$T(\delta) = T_1 + \delta^{-1} T_2.$$

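The summands $T_1$ and $T_2$ are defined by

$$T_1 : \begin{cases} u_{3k-1} \mapsto y_{3k-1} \\ e_{3k-1} \mapsto 0 \\ e_{3k} \mapsto 0 \end{cases} \qquad T_2 : \begin{cases} u_{3k-1} \mapsto 0 \\ e_{3k-1} \mapsto y_{3k-1} - y_{3k-2} \\ e_{3k} \mapsto y_{3k} - y_{3k-1} \end{cases}$$

for $1 \le k \le n$. For instance, $T(\delta)\,u_{3k-2}(\delta) = y_{3k-1} - \delta \cdot \delta^{-1}(y_{3k-1} - y_{3k-2}) = y_{3k-2}$, and similarly $T(\delta)\,u_{3k}(\delta) = y_{3k}$.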


Of course, the objective of the algorithm is to enable a mapping that also sends $u_j$ into $y_j$ for $3n + 1 \le j \le N$. The d - 1 remaining points $u_j$ can be placed anywhere in the L-cube. Thus, it suffices to choose $\delta$ so as to guarantee a preimage in the L-cube for every $y_j$, $3n + 1 \le j \le N$.

Since the $y_j$'s are in general position in $\mathbb{R}^m$, the differences $y_{3k-2} - y_{3k-1}$ and $y_{3k-1} - y_{3k}$, $1 \le k \le n$, are linearly independent. It follows that $\mathrm{rank}(T_2) = m = 2n$. Thus, the minimum singular value of $T_2|_E$ is positive:

$$\sigma = \sigma_{\min}(T_2|_E) > 0.$$

From this it also follows that

$$\|T_2 v\| \ge \sigma \|v\| \quad \text{for all } v \in E.$$


We choose $\delta = \sigma / \max_j \|y_j\|$. This choice of $\delta$ yields a transformation $T(\delta)$ that admits preimages in the L-cube for all $y_j$, $3n + 1 \le j \le N$. Indeed, preimages exist in the intersection of E with the L-cube.

For $3n + 1 \le j \le N$, choose $u_j \in E$ such that $T_2(u_j) = \delta y_j$. This is possible since $T_2|_E$ is a bijection $E \to \mathbb{R}^m$. We have

$$\|\delta y_j\| = \|T_2(u_j)\| \ge \sigma \|u_j\|.$$

From this it follows that

$$\|u_j\| \le \frac{\delta}{\sigma} \|y_j\| \le 1.$$

Thus, $u_j$ lies in the L-cube. Finally, for $3n + 1 \le j \le N$,

$$T(\delta)(u_j) = (T_1 + \delta^{-1} T_2)(u_j) = \delta^{-1} T_2(u_j) = y_j,$$

since $T_1$ vanishes on E.
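The construction of $T(\delta)$ and of the preimages $u_j$ can be tried numerically. The Python/NumPy sketch below assumes the relabeling of step (2) of the summary has already been done and builds the hidden points for $j \le 3n$ exactly as in the equations for $u_j(\delta)$. The function name, the use of all $y_j$ in $\max_j \|y_j\|$, and the cap $\delta \le 1/2$ (added so that $0 < \delta < 1$ is guaranteed) are assumptions of this sketch, not part of the report; it does not compute the first-layer functionals of step (5).

```python
import numpy as np


def second_layer(Y_first, Y_rest):
    """Build T(delta) = T_1 + T_2 / delta and the hidden points u_j.

    Y_first: (3n, m) array with rows y_1, ..., y_{3n}, where m = 2n.
    Y_rest:  (d-1, m) array with rows y_{3n+1}, ..., y_N.
    Returns T (m x L), rows u_j(delta) for j <= 3n, and rows u_j in E for j > 3n.
    """
    n = Y_first.shape[0] // 3
    L, m = 3 * n, 2 * n
    # u_{3k-1}: +1 in coordinates 1..3k-1 and -1 in 3k..L (1-based), as columns
    corners = np.array([[1.0 if c < 3 * k - 1 else -1.0 for c in range(L)]
                        for k in range(1, n + 1)]).T                  # L x n
    # E is spanned by e_{3k-1}, e_{3k}; 0-based these are columns 3k-2 and 3k-1
    e_cols = [c for k in range(1, n + 1) for c in (3 * k - 2, 3 * k - 1)]
    E = np.eye(L)[:, e_cols]                                          # L x 2n
    basis = np.hstack([corners, E])                                   # basis of U + E

    # Images of the basis vectors (y_{3k-1} is row 3k-2 of Y_first, 0-based)
    mid = [Y_first[3 * k - 2] for k in range(1, n + 1)]
    t1_img = np.column_stack(mid + [np.zeros(m)] * (2 * n))
    t2_cols = []
    for k in range(1, n + 1):
        t2_cols.append(Y_first[3 * k - 2] - Y_first[3 * k - 3])       # e_{3k-1} -> y_{3k-1} - y_{3k-2}
        t2_cols.append(Y_first[3 * k - 1] - Y_first[3 * k - 2])       # e_{3k}   -> y_{3k} - y_{3k-1}
    t2_img = np.column_stack([np.zeros(m)] * n + t2_cols)
    # Solve T @ basis = images for each map
    T1 = np.linalg.solve(basis.T, t1_img.T).T
    T2 = np.linalg.solve(basis.T, t2_img.T).T

    T2E = T2 @ E                                                      # T_2 restricted to E
    sigma = np.linalg.svd(T2E, compute_uv=False).min()
    Y_all = np.vstack([Y_first, Y_rest])
    delta = min(sigma / np.linalg.norm(Y_all, axis=1).max(), 0.5)     # cap is an assumption

    U_first = corners.T.repeat(3, axis=0)                             # u_{3k-1}, three times each
    for k in range(1, n + 1):
        U_first[3 * k - 3, 3 * k - 2] -= delta                        # u_{3k-2} = u_{3k-1} - delta e_{3k-1}
        U_first[3 * k - 1, 3 * k - 1] += delta                        # u_{3k} = u_{3k-1} + delta e_{3k}
    U_rest = (E @ np.linalg.solve(T2E, delta * Y_rest.T)).T           # T_2 u_j = delta y_j, u_j in E
    return T1 + T2 / delta, U_first, U_rest


rng = np.random.default_rng(0)
n, d = 2, 4                                   # L = 6, m = 4, N = d + L - 1 = 9
Y_first = rng.normal(size=(3 * n, 2 * n))
Y_rest = rng.normal(size=(d - 1, 2 * n))
T, U_first, U_rest = second_layer(Y_first, Y_rest)
```

Every row of `U_first @ T.T` and `U_rest @ T.T` should reproduce the corresponding $y_j$, and all hidden coordinates stay in $[-1, 1]$.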


In summary, the determination of the weights proceeds as follows:

(1) Inputs

(1.1) d, L, m satisfying $N = d + L - 1$, $L = 3n$, and $m = 2n$.

(1.2) N pairs $(x_j, y_j)$. The sets $X = \{x_j\}$ and $Y = \{y_j\}$ are in general position in $\mathbb{R}^d$ and $\mathbb{R}^m$, respectively.

(2) Find a facet F of Hull(X) and select a (d-1)-subset S of F. Let G be the (d-2)-dimensional affine subspace through S and let P be the 2-dimensional linear subspace orthogonal to G. Let Q denote the projection of X onto P. The entire set S projects onto a point s in P. Thus, $|Q| = N - d + 2 = 3n + 1$. Moreover, s is an extreme point of Hull(Q) in P. Let K be a directed line through s that does not separate $Q \setminus \{s\}$. For each q in $Q \setminus \{s\}$, the vector $q - s$ makes some angle $\theta(q)$ with K, $0 < \theta(q) < \pi$. This orders the q's. Relabel the pairs $(x_j, y_j)$ so that $S = \{x_{3n+1}, x_{3n+2}, \ldots, x_N\}$ and $\theta(q_1) < \theta(q_2) < \cdots < \theta(q_{3n})$.

(3) Determine the transformations $T_1$ and $T_2$ satisfying

$$T_1 + T_2 : u_j(1) \mapsto y_j, \quad 1 \le j \le 3n,$$

$$T_1|_E = 0 \quad \text{and} \quad T_2|_U = 0.$$

Compute the minimum singular value $\sigma$ of $T_2|_E$ and set $\delta = \sigma / \max_j \|y_j\|$.

(4) Compute the preimage $u_j \in E$ of $\delta y_j$ under $T_2$ for $3n + 1 \le j \le N$:

$$u_j = \delta\, (T_2|_E)^{-1} y_j.$$

At this point all of the $u_j$'s are known. For $1 \le j \le 3n$, they are the $u_j(\delta)$'s.

(5) Finally, compute the 3n affine functionals $f_j$ (guaranteed by Lemma 2) that map the $x_j$'s into the $u_j$'s. The first layer of weights is determined by the $f_j$'s, while the second layer is determined by $T_1 + \delta^{-1} T_2$.